Project Objective
This web page was created as part of an assignment in Emory University's Biology 142 lab course. Students were assigned proteins to research in order to learn about the function and structure of each protein and to determine whether the newly sequenced whale shark genome contains evidence of the same proteins. This page is an analysis of LIG4.
Background
LIG4 (or Ligase 4) is a DNA ligase that joins single-stranded breaks in a double-stranded polydeoxynucleotide in an ATP-dependent reaction. It is used in recombination and DNA repair. Because of its role in recombination, it is associated with genetic variability within and between species. Defects in the gene coding for the LIG4 protein can cause LIG4 syndrome, which is characterized by developmental and growth delay; unusual facial features and skin abnormalities can also occur (NCBI Gene).
The LIG4 protein is an essential component in joining together separated strands of DNA, specifically in non-homologous end joining (NHEJ), which joins DNA ends without the need for a homologous template (Wang et al. 2004), as shown in Figure 1. NHEJ is needed when deleterious lesions occur, which can lead to genomic instability or diversity (Bétermier et al. 2014).
Figure 1. Ligase 4 protein contributes to the process of non-homologous end joining, in which breaks in double-stranded DNA are repaired. Without this process, there can be genetic instability ("Non-Homologous End-Joining").
Methods
Whale Shark Orthologs
The Galaxy server (whaleshark.georgiaaquarium.org) was used to run a BLAST search against the predicted whale shark protein database, with the human protein sequence (ENSP00000349393) as the query. The top five predicted protein hits were identified and the FASTA DNA sequences were extracted. The full predicted whale shark sequences, not just the aligned portions, were then used as queries in a BLAST search against the NCBI human protein database (http://blast.ncbi.nlm.nih.gov).
Predicted Orthologs
Predicted LIG4 orthologs were identified in species other than the whale shark using the NCBI BLAST server. The human LIG4 protein (ENSP00000349393) was used as the query sequence in protein BLAST searches against the mouse, zebrafish, clawed frog, fruit fly, yeast, and elephant shark protein databases.
Phylogenetic Tree
The human sequence was the query to which the other species' sequences were compared. The top five whale shark predicted protein hits, along with the zebrafish, elephant shark, and mouse BLAST hits, were aligned with ClustalW2 to produce a multiple sequence alignment, from which the phylogenetic tree was created.
Results
LIG4 in the Whale Shark
The results of the BLAST search of the whale shark database on the Galaxy server, with the human protein sequence as the query, are shown below in Table 1. Five hits, each with an e-value of 4e-07 or lower, were identified. The best predicted hit had an e-value of 0.0.
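| Whale Shark ID | e-value | Alignment Length | Predicted Protein Length | % Identity |
|----------------|---------|------------------|--------------------------|------------|
| g40588.t1      | 0.0     | 635              | 911                      | 75.75      |
| g26119.t1      | 4e-08   | 76               | 911                      | 35.53      |
| g25015.t1      | 3e-08   | 52               | 911                      | 37.14      |
| g26186.t1      | 4e-07   | 71               | 911                      | 30.99      |
| g45065.t1      | 6e-15   | 196              | 911                      | 27.55      |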
Table 1. Top human LIG4 BLASTP hits in the whale shark predicted protein database. The search was run on the Georgia Aquarium Galaxy server with the human protein sequence as the query. The table lists the whale shark ID, e-value, alignment length, predicted protein length, and % identity of the top five hits, selected by lowest e-value.
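As a minimal sketch of how the Table 1 hits could be ranked, the Python below sorts the five hits by e-value and estimates how much of the query each alignment spans. The values are copied from Table 1. The 911-residue query length is taken from the human LIG4 entry in Table 2. The function name `rank_hits` and the coverage estimate (alignment length divided by query length) are illustrative assumptions, not part of the original analysis; alignment length can include gap columns, so the coverage is only approximate.

```python
# Hits copied from Table 1: (whale shark ID, e-value, alignment length, % identity).
HITS = [
    ("g40588.t1", 0.0, 635, 75.75),
    ("g26119.t1", 4e-08, 76, 35.53),
    ("g25015.t1", 3e-08, 52, 37.14),
    ("g26186.t1", 4e-07, 71, 30.99),
    ("g45065.t1", 6e-15, 196, 27.55),
]

# Length of human LIG4 (NP_002303.2) as listed in Table 2.
QUERY_LENGTH = 911


def rank_hits(hits, query_length=QUERY_LENGTH):
    """Sort hits by ascending e-value and attach an approximate query coverage."""
    ranked = sorted(hits, key=lambda hit: hit[1])
    return [
        {
            "id": hit_id,
            "evalue": evalue,
            "identity": identity,
            # Assumption: alignment length / query length, which ignores gaps.
            "coverage": round(100 * aln_len / query_length, 1),
        }
        for hit_id, evalue, aln_len, identity in ranked
    ]


if __name__ == "__main__":
    for row in rank_hits(HITS):
        print(row["id"], row["evalue"], f'{row["coverage"]}%')
```

Ranked this way, g40588.t1 stands apart from the other four hits on both e-value and the fraction of the query it aligns to.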
The predicted whale shark protein g40588.t1 returned LIG4 as the best hit against the NCBI human protein database. The process was repeated using the elephant shark, mouse, zebrafish, clawed frog, fruit fly, and yeast predicted LIG4 proteins to identify protein domains, identify more orthologs, and build a phylogenetic tree.
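The two-way search described here (human query against whale shark, then whale shark hit back against human) resembles a reciprocal best hit check. The sketch below shows that logic in Python under stated assumptions: the helper names `best_hit` and `is_reciprocal_best_hit` are hypothetical, the forward e-values come from Table 1, and the reverse e-value is a placeholder because the page reports only that LIG4 was the best human hit for g40588.t1.

```python
def best_hit(hits):
    """Return the subject ID with the lowest e-value, or None if there are no hits."""
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


def is_reciprocal_best_hit(query_id, forward_hits, reverse_hits_by_subject):
    """Check that the forward best hit's own best hit is the original query."""
    candidate = best_hit(forward_hits)
    if candidate is None:
        return False
    return best_hit(reverse_hits_by_subject.get(candidate, [])) == query_id


# Forward search: human LIG4 against whale shark predicted proteins (Table 1).
FORWARD = [
    ("g40588.t1", 0.0),
    ("g26119.t1", 4e-08),
    ("g25015.t1", 3e-08),
    ("g26186.t1", 4e-07),
    ("g45065.t1", 6e-15),
]

# Reverse search: whale shark protein against human proteins.
# Placeholder e-value: the page states only that LIG4 was the best hit.
REVERSE = {"g40588.t1": [("LIG4", 0.0)]}

if __name__ == "__main__":
    print(is_reciprocal_best_hit("LIG4", FORWARD, REVERSE))
```

A hit that passes this check in both directions is a stronger ortholog candidate than one supported by the forward search alone.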
Protein domains
The best-hit predicted LIG4 protein in the whale shark (from the BLAST results) contains three domains: a DNA_ligase_A_N domain, an Adenylation domain, and an OBF domain (Figure 2). DNA_ligase_A_N domains are found in many but not all ATP-dependent DNA ligase enzymes. In some species, such as humans and yeasts, DNA_ligase_A_N domains are necessary for DNA binding and for catalysis (Sekiguchi and Shuman 1997). The Adenylation domain is responsible for the specific recognition of amino acids and their activation as adenylyl amino acids. The reaction catalyzed is aa + ATP → aa-AMP + PPi (Konz and Marahiel 1999). The OBF domain catalyzes phosphodiester bond formation on nicked nucleic acid substrates, using the high-energy nucleotide ATP. A protein essential for ATP hydrolysis is located in the OB-fold domain (Martin and MacNeill 2002).
Figure 2. Putative domains of the whale shark LIG4 best hit predicted protein. The best-hit whale shark predicted protein contains putative DNA_ligase_A_N, Adenylation and OBF domains as predicted by NCBI BLAST server analyses.
Homologs/Orthologs
The human LIG4 protein sequence (ENSP00000349393) was used as the query in NCBI BLAST searches against the protein databases of mouse, zebrafish, clawed frog, fruit fly, yeast, and elephant shark. The best hit for each species is listed in Table 2. LIG4 orthologues were found in all of these species. LIG4 homologues were identified in mouse and elephant shark because the alignments spanned the entire sequence.
| Species        | Name                                                   | ID             | Length | Query coverage | E-value | Identity |
|----------------|--------------------------------------------------------|----------------|--------|----------------|---------|----------|
| Homo sapiens   | DNA ligase 4 [Homo sapiens]                            | NP_002303.2    | 911    | N/A            | N/A     | N/A      |
| Mouse          | DNA ligase 4 [Mus musculus]                            | NP_795927.2    | 911    | 100%           | 0.0     | 87%      |
| Elephant shark | PREDICTED: DNA ligase 4 [Callorhinchus milii]          | XP_007891664.1 | 914    | 100%           | 0.0     | 69%      |
| Zebrafish      | PREDICTED: DNA ligase 4 isoform X1 [Danio rerio]       | XP_005160015.1 | 909    | 99%            | 0.0     | 64%      |
| Clawed frog    | DNA ligase 4 [Xenopus (Silurana) tropicalis]           | NP_001016981.1 | 911    | 99%            | 0.0     | 73%      |
| Fruit fly      | Ligase4 [Drosophila melanogaster]                      | NP_572907.1    | 918    | 89%            | 2e-123  | 31%      |
| Yeast          | DNA ligase (ATP) DNL4 [Saccharomyces cerevisiae S288c] | NP_014647.1    | 944    | 96%            | 1e-79   | 26%      |
Table 2. Best hits from human LIG4 protein BLAST searches. The human LIG4 sequence was used in protein BLAST searches against individual species. The name, ID, length, query coverage, E-value, and identity of the best hit from each search are reported here.
Phylogeny
The best hits from each of the BLAST searches of the protein databases were aligned in ClustalW2 to create the phylogenetic tree. Three of the top whale shark hits (g26186.t1, g26119.t1, and g45065.t1) are the oldest in relation to the other species' sequences, while the human, mouse, clawed frog, and elephant shark sequences are the youngest on the tree. The g40588.t1 sequence is a strong candidate for the whale shark LIG4 protein: it is most closely related to the elephant shark sequence, the elephant shark is known to share a large portion of its DNA with the whale shark, and g40588.t1 had the lowest possible e-value and a long alignment length.
Figure 3. Phylogenetic tree of the top BLAST hits. The tree was created using the ClustalW2 program, which aligned the top whale shark hits and the other species' sequences and arranged them by evolutionary age.
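Trees like this one are built from a multiple sequence alignment, and one common ingredient is a pairwise distance between aligned sequences. The Python sketch below computes a simple p-distance (the fraction of differing residues over ungapped columns) for every pair in an alignment. It illustrates the general idea only and is not a description of ClustalW2's internal procedure. The function names are hypothetical and the sequences are toy data, not real LIG4 sequences.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in compared) / len(compared)


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name_a, name_b), with names in sorted order."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Toy alignment: hypothetical sequences, NOT real LIG4 data.
TOY_ALIGNMENT = {
    "seq_A": "MKTAYIAK-R",
    "seq_B": "MKTAYLAKQR",
    "seq_C": "MRSAYLGKQR",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(TOY_ALIGNMENT).items():
        print(pair, round(dist, 3))
```

Pairs with smaller distances would be expected to sit closer together on a distance-based tree, which is the kind of relationship reported above for g40588.t1 and the elephant shark sequence.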
Conclusion
We were able to identify predicted LIG4 orthologs in the whale shark. This was done by using BLAST to compare the human protein sequence to the predicted whale shark protein sequences. The five best whale shark hits were g40588.t1, g26119.t1, g25015.t1, g26186.t1, and g45065.t1. The top hit was g40588.t1, with an e-value of 0.0. More orthologs were found through NCBI BLAST searches using the human protein sequence as the query against the mouse, zebrafish, clawed frog, fruit fly, yeast, and elephant shark databases. Finally, these sequences were used to construct a phylogenetic tree depicting the evolution of the protein. The top whale shark hit, g40588.t1, was found to be most closely related to the elephant shark sequence; the elephant shark has previously been identified as the whale shark's closest known relative. More research needs to be done to determine where the gene that codes for the LIG4 protein falls in the whale shark genome. This is important because of LIG4's critical role in non-homologous end joining, recombination, and genetic variation.
References
Bétermier, M., Bertrand, P., and Lopez, B.S. (2014). Is Non-Homologous End-Joining Really an Inherently Error-Prone Process? PLOS Genetics. DOI: 10.1371/journal.pgen.1004086
Davis, A.J. and Chen, D.J. (2013). DNA double strand break repair via non-homologous end-joining. Translational Cancer Research 2(3):130-143.
Girard, P.M., Kysela, B., Harer, C.J., Doherty, A.J., and Jeggo, P.A. (2004). Analysis of DNA ligase IV mutations found in LIG4 syndrome patients: the impact of two linked polymorphisms. Human Molecular Genetics 13(20):2369-2376. DOI:10.1093/hmg/ddh274
Wang, Y., Nnakwe, C., Lane, W.S., Modesti, M., and Frank, K.M. (2004). Phosphorylation and Regulation of DNA Ligase IV Stability by DNA-dependent Protein Kinase. The Journal of Biological Chemistry 279:37282-37290.
Konz, D. and Marahiel, M.A. (1999). How do peptide synthetases generate structural diversity? Chemistry & Biology 6(2):R39-48. DOI: 10.1016/s1074-5521(99)80002-7. PMID: 10021423.
Sekiguchi, J. and Shuman, S. (1997). Domain structure of vaccinia DNA ligase. Nucleic Acids Research 25(4):727-734.
Martin, I.V. and MacNeill, S.A. (2002). ATP-dependent DNA ligases. Genome Biology 3(4):reviews3005.1-reviews3005.7.